We propose a privacy-enhanced matrix factorization recommender that exploits the fact that users can often
be grouped together by interest. This allows a form of “hiding in the crowd” privacy. We introduce a novel 
matrix factorization approach suited to making recommendations in a shared group (or nym) setting and 
the BLC algorithm for carrying out this matrix factorization in a privacy-enhanced manner. We demonstrate 
that the increased privacy does not come at the cost of reduced recommendation accuracy.

1. INTRODUCTION

In a classical recommender system a set of users rate a set of items and these ratings 
are then used to predict user ratings for those items which they have not yet rated, 
see Figure 1. The ratings supplied by each user are tagged with the user identity,
e.g. via a browser cookie, and so the ratings are individually identifiable. This raises
obvious privacy concerns, whereby an attacker observing the item ratings submitted 
by each user may learn information which the user does not wish to disclose. Such an 
attacker might observe the rating by sniffing the network path between a user and the 
recommender system or, more likely, may be the recommender system itself so that 
encryption of the network traffic does not constitute a defence. 

It is well known that users can often be grouped together by e.g. interest in sport,
type of movie or choice of partner, and indeed this underpins most advertising campaigns.
While such grouping is often carried out manually, there has also been interest
in automated inference of abstract groups and it is such unsupervised automated inference
of abstract groups that we pursue in the present paper. Namely, we start with
the observation that the existence of abstract group structure raises the potential to
use a group identity when submitting ratings to a recommender system instead of
an individual user identity, and in this way provide a form of “hiding in the crowd”
privacy. One key challenge, which we fully resolve here, is to provide accurate recommendations
to a user without requiring the user to disclose their personal ratings to
the recommender system. This is essential to protect against the type of privacy disclosure
attack considered here. Another challenge is the selection of appropriate abstract
groups that yield good prediction accuracy while being shared by a sufficiently large
number of users so as to provide a degree of privacy. In this paper we demonstrate that
this is indeed possible for interesting, real data sets and consequently that the gain in
privacy need not come at the cost of reduced recommender prediction accuracy.

[Figure 1: (a) users 1 and 2 submit (user, item, rating) tuples, e.g. (1,1,1), (1,4,3) and (2,2,5), (2,3,2), to the recommender system; (b) the resulting users x items rating matrix R.]

Fig. 1. Illustrating classical recommender system setup. Each user submits ratings for a set of items, where
this set may be different for each user, the ratings being tagged with the user identity, see left-hand plot (a).
The submitted (user, item, rating) tuples can be gathered into a matrix R, see right-hand plot (b), and the aim
of the system is to predict the missing values, indicated by -.

In more detail, we introduce a set of artificial pseudo-identities, referred to as proxynyms
(or, in short, nyms), with which users access the recommending service. These
nyms differ from ordinary pseudonyms in that the same nym is shared by many users.
From the point of view of the system, therefore, it receives a sequence of item ratings
from each of p nyms rather than from n users. To make this setup work two main tasks
must be addressed: (i) each user must privately select an appropriate nym when submitting
ratings and (ii) the received collection of nym-item ratings must be capable of
being used to make accurate predictions of the user ratings. Observe that these two
tasks are coupled and must be solved jointly, i.e. the set of users selecting
a nym must share enough in common to permit accurate predictions to be made when 
only nym-item ratings are observed. Further, unlike in the classical setup the same 
item may often be rated multiple times by a nym, since multiple users share the nym, 
and the nyms are not all equally “important” since one may be shared by many more 
users than another. 

In order to make recommendations we build on existing matrix factorization approaches
in which the matrix R of user-item ratings (see Figure 1(b)) is decomposed
as a product $U^T V$, where the row dimension of matrix U and matrix V is much less
than the numbers of users and items. The user-nym selection and nym-item matrix
factorisation tasks can be viewed as decomposing the $n \times m$ user-item rating matrix R
as $P^T \tilde{U}^T V$, where the $n \times p$ matrix $P^T$ maps from users to nyms. The elements of P are
{0,1} valued and P is column stochastic (the columns sum to one) so that each user is
a member of a single nym (extension to multiple nym membership is of course possible).
One of our main contributions is the observation that by use of the BLC algorithm
it is in fact possible to carry out this P, $\tilde{U}$, V decomposition in a privacy-enhancing
manner, without the need for sophisticated cryptographic methods.
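
As a concrete reading of this decomposition, the short Python sketch below builds a toy example with made-up sizes (n = 4 users, p = 2 nyms, d = 2 latent dimensions, m = 3 items) and evaluates $P^T \tilde{U}^T V$. The helper names `matmul` and `transpose` and all numerical values are illustrative assumptions, not part of the BLC algorithm itself.

```python
# Toy illustration of R ~ P^T U_tilde^T V with hypothetical sizes:
# n = 4 users, p = 2 nyms, d = 2 latent dimensions, m = 3 items.

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

# P is p x n; column u has a single 1 in the row of the nym chosen by user u.
P = [[1, 0, 1, 0],
     [0, 1, 0, 1]]

# U_tilde is d x p (one latent preference vector per nym), V is d x m.
U_tilde = [[1.0, 0.0],
           [0.0, 2.0]]
V = [[3.0, 1.0, 0.0],
     [0.5, 2.0, 1.0]]

# Column stochastic and {0,1} valued: each user belongs to exactly one nym.
assert all(sum(col) == 1 for col in zip(*P))

nym_item = matmul(transpose(U_tilde), V)      # p x m predicted nym-item ratings
user_item = matmul(transpose(P), nym_item)    # n x m predicted user-item ratings
```

Users 0 and 2 share the first nym and so receive identical predicted ratings, as do users 1 and 3, which is the sense in which a nym stands in for its members.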

At the cost of introducing a more complex factorisation task, we find that prediction
performance competitive with the state of the art can be obtained using this approach,
especially when the user rating matrix R is sparse (as it usually is). That is, the increased
privacy does not come at the cost of reduced recommendation accuracy. Indeed,
we show that it can yield improved accuracy over classical matrix factorisation. As already
noted, this is perhaps unsurprising since, when nyms are chosen appropriately,
users sharing a nym can usefully leverage their shared ratings/preferences in a more
direct way than in classical collaborative filtering, where such leverage can become
apparent only when the dimension of the latent space is very small. Further, we find
that the projection P provides a useful form of regularisation that greatly reduces the
sensitivity of performance to the choice of hyperparameters. While more computationally
demanding than the standard matrix factorisation approach, BLC is still efficient
and highly scalable.

In summary, the main contributions of the present paper are as follows: (i) a novel 
matrix factorization approach suited to making recommendations in a shared nym 
setting, (ii) the BLC algorithm for carrying out this matrix factorization in a privacy-enhanced
manner and (iii) its performance evaluation using a mix of both real and
synthetic data sets. Although not the focus of the present paper, we also note that a 
relevant feature of our approach is its potential for backward compatibility with many 
existing recommender systems, and so its potential suitability for incremental roll-out.

2. RELATED WORK

The potential for privacy concerns in recommender systems is well known, e.g. see 
[Lam and Riedl 2004; Calandrino et al. 2011; Shyong et al. 2006; Ding et al. 2010; 
Narayanan and Shmatikov 2006] and references therein. 

Many existing approaches to making private recommendations are based on encryption
and multi-party computation [Guha et al. 2011; Li et al. 2011; Aimeur et al. 2008;
Nikolaenko et al. 2013; Canny 2002] and are essentially clean-slate designs (not backward
compatible with existing recommender systems). In [Nikolaenko et al. 2013],
a trusted cryptographic service provider performs an encrypted version of the matrix
factorization task used to calculate recommendations. In [Aimeur et al. 2008], the
user data is split between the merchant and a semi-trusted third party; if these two
actors do not collude, the cryptographic system proposed allows user recommendations
to be calculated without exposing private information. In [Li et al. 2011], content
ratings and item similarity data are determined via distributed cryptographic multi-party
computation; recommendations are then generated based on interest groups and
further personalised at each user’s local machine. In the special case where the recommender
system’s task is to select which adverts to display, another approach [Guha
et al. 2011] is for the web system to supply a large set of possible adverts to the user
and for the user’s browser to then privately make the recommendation decision as to
which adverts are actually displayed. In [Canny 2002], a privacy-preserving scheme
for a Singular Value Decomposition (SVD) based collaborative filtering (CF) algorithm is introduced.

Another strand of work on privacy-enhanced recommender systems improves privacy
by perturbing ratings with noise. Initially proposed by [Agrawal and Srikant
2000], this idea has been applied to collaborative filtering by e.g. [Huang et al. 2005;
Kargupta et al. 2003; Polat and Du 2005].

Perhaps the closest to the present work is that of [Shokra et al. 2009] who propose
a peer to peer system for submitting aggregated ratings to a central recommender
system. In [Nandi et al. 2011] a middleware framework for supervised aggregation
is discussed but recommender performance is not considered. In [Narayanan and
Shmatikov 2006], a technique for the distributed aggregation of online profiles is presented,
but it requires a certain level of trust and co-ordination between users and,
being peer-to-peer, its privacy and recommendation quality depend strongly on the user
connectivity.

Clustering of users in recommender systems in general is not new. In [Ungar and
Foster 1998; Xue et al. 2005; Hofmann 2004] a range of techniques (K-means and
Gibbs sampling, a cluster-based smoothing system, and a cluster-based latent semantic
model respectively) are proposed which cluster users and items in an unsupervised
manner. However, these are not suited to implementation in a distributed and private
manner. Clustering with matrix factorization methods has received much less attention
to date, with the notable exception of [Ding et al. 2006], and subsequent work
building on this, in which an orthogonal non-negative matrix tri-factorization method
is introduced. In [Zhu et al. 2014] it is observed that users with a few ratings can
leverage the presence of users with many ratings to improve prediction accuracy, and
[Xin and Jaakkola 2014] consider a semi-private variant where private users can
leverage the presence of public users with many ratings.

Finally, although BLC deals with groups, it differs (in scope and technical construction)
from traditional group recommender systems. Such systems were originally conceived
[O’Connor et al. 2001] for social contexts in which people operate in groups (e.g.
deciding which movie to see together, or which restaurant to attend). Their goal is
to provide suggestions suitable to all members of the group. As comprehensively surveyed
in [Boratto and Carta 2011; Kompan and Bielikova 2013], existing work differs
in the use of user preferences and/or social information, and in how such information
is accounted for when making recommendations for groups (or even settling disagreements
between group members [Shang et al. 2011]). BLC is not a group recommender
in the meaning above; it is an individual matrix factorization recommender that additionally
clusters users into groups to improve the recommender’s accuracy. In this
sense, and putting aside our main privacy goal, BLC may be considered closer, in its
scope, to content-boosted [Nguyen and Zhu 2013] or category/topic-based [Zhou et al.
2014; Wang et al. 2012] recommenders, and to a greater extent closer to information
matching approaches [Gorla et al. 2013], which show that (probabilistic) matrix factorization
approaches can be improved by adding information on content categories
or user groups (which are built using user attributes or features). However, we stress
that, unlike these works, we do not exploit any exogenous information such as user
preferences of item categories, and we do not boost the recommender with an a priori
computed group structure, but we try to learn groups from the users’ own ratings,
and we do this while performing the matrix factorization.

Fig. 2. Data flow diagram of the system scenario with two possible points of attack.

3. THREAT MODEL 

The privacy disclosure scenario we consider arises naturally from attacks which have
already occurred on production recommender systems [Narayanan and Shmatikov
2006]. Figure 2 illustrates the setup. A user sends encrypted ratings and receives recommendations
through an authentication based system. The user ratings are kept in
the system storage, where more manipulation is needed to generate the recommendations.
The recommender system can also decide to release an anonymised/aggregated
version of the database. We imagine two main types of attacks/threats¹:

(1) The attacker has access to a leaked version of the user ratings database held by the
recommender system.

(2) The recommender system can decide to release an anonymised version of the user
ratings database.

It is important to notice that the sensitive asset here is the stream of pairs of unique
IDs and ratings. Knowledge of these potentially places two sorts of information at
risk: (i) the mapping from ID to true user identity (which may be of interest in its own
right) and (ii) the ratings submitted by the user being targeted for attack. When a
single user is associated with each ID, then since the rating profile associated with
each ID is often unique there is a risk that the association of a user with an ID may be
inferred by de-anonymisation using external data [Datta et al. 2012; Aggarwal 2005].
In this case both the mapping from ID to user identity and the ratings submitted by a
user are then simultaneously compromised.

¹Note that we do not consider attacks that aim to compromise the user client/browser or attacks against the
encryption on the link between user and server, both of which are already much studied.

With this kind of attack in mind, our interest is instead in the situation where multiple
users are associated with each ID. This changes matters significantly. Firstly,
consider an attack that aims to learn the mapping from ID to true user identity. When
an ID is shared by k users then the collection of (item, rating) pairs associated with
that ID is now a mixture of ratings from the k different users. Provided the k users
sharing an ID submit sufficiently different sets of (item, rating) pairs then this attack
can be expected to involve a harder inference task than when an ID is used by a single
user. This is because the (item, rating) pairs submitted by the other users sharing the
ID now act as “noise” that tends to mask the pattern of (item, rating) pairs submitted
by the user being targeted. Of course, if the users sharing an ID submit sets of ratings
which are too similar to each other then the protection against de-anonymisation
provided by this direct mixing mechanism may be reduced, in which case additional
measures can be envisioned e.g. inserting dummy ratings or other “noise” to increase
diversity.
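
The masking effect of sharing an ID can be illustrated with a small sketch. It assumes, purely for illustration, three hypothetical users with hand-picked ratings, at most one rating per user and item, and a helper `shared_id_view` that is not part of the system described here; it shows only what an observer of the shared ID would see.

```python
from collections import defaultdict

def shared_id_view(profiles):
    """Return what an observer of a shared ID sees: for each item, the
    multiset of ratings submitted under that ID, with no user labels."""
    view = defaultdict(list)
    for ratings in profiles.values():
        for item, rating in ratings.items():
            view[item].append(rating)
    # Sorting discards any ordering that might hint at who submitted what.
    return {item: sorted(rs) for item, rs in view.items()}

# Hypothetical ratings of k = 3 users sharing one ID (item -> rating).
profiles = {
    "alice": {"item1": 5, "item2": 1},
    "bob":   {"item1": 4, "item3": 2},
    "carol": {"item1": 5},
}

view = shared_id_view(profiles)
k = len(profiles)

# Only for items rated by all k members can an observer conclude that a
# particular member rated the item at all.
rated_by_everyone = [item for item, rs in view.items() if len(rs) == k]
```

In this toy case the observer sees the ratings 4, 5 and 5 for `item1` without learning which member submitted which value, and cannot tell which of the three users rated `item2` or `item3`.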

Secondly, when multiple users are associated with each ID then linking a user to
an ID no longer immediately reveals that user’s ratings, or even the more limited information
consisting of the set of items rated by that user². To see this, suppose for a
particular item that only n of the k users sharing an ID have rated that item. Then
an attacker does not learn whether the user of interest belongs to the set of n users
who have rated that item or the set of k - n users who have not. It is only when
all users sharing an ID rate a particular item, i.e. n equals k for that item, that the
attacker learns that the target user has rated that item. Even in that case the attacker
does not learn the rating submitted by the user unless all users sharing the ID
also rate the item identically. This direct “hiding in the crowd” mechanism might be
further strengthened, if needed, by users additionally submitting a number of automated/dummy
(item, rating) pairs selected at random. Then, even when an item is rated
by all users sharing an ID, any individual user has a degree of plausible deniability in
that they can claim that their rating for that item was an automated/dummy one.

It can be seen that associating multiple users with each ID in a recommender system
not only potentially makes the considered de-anonymisation attacks harder to
carry out but also creates the foundation upon which a number of still stronger mechanisms
for enhancing privacy can be built. Supporting shared IDs, which we refer to
as nyms, while still providing useful recommendations is also the main technical challenge
(inserting dummy ratings etc. is relatively straightforward) and in the following
sections we demonstrate that this can indeed be achieved.

3.1. Dishonest Users and Sybil Attacks

Attacks by dishonest users who submit false ratings in an attempt to manipulate the
recommendations made by the system are outwith the scope of the present paper. Of
course this is an important challenge for all recommender systems, but it is not specific
to the approach presented here. That said, the use of shared IDs/nyms does potentially
facilitate Sybil attacks and so we briefly describe one mechanism, based on the work
of [Chaum et al. 1990], by which such attacks can be disrupted while making use
of nyms. In summary, each user mints a number of session tokens (with associated
serial number), blinds them with a secret blinding factor and forwards them to the
recommender system through a non-secure channel. The number of tokens available to
a user is limited e.g. by requiring users to authenticate or make payment to the service
in order to forward a token, or perhaps by limiting the number of tokens allowed within
a certain time window. Note that during this phase the user might be identified to the
system, e.g. to make a payment. The system then signs the tokens with its private key,
without knowledge of the serial number associated with the tokens. On receiving the
signed tokens back from the recommender system, the user can remove the blinding
factor and use the tokens to submit ratings to the system anonymously. Double use of
tokens is prevented by the system maintaining a database of the serial numbers of all
tokens that have already been used.

²While an attacker might potentially also be interested in which items have not been rated by a user, in a
recommender system the ratings by a user are usually extremely sparse, i.e. the number of available items
is much larger than the number of ratings submitted by any individual user. Absence of a rating is therefore
typically much less informative than the presence of a rating.
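
A minimal sketch of this token flow is given below, assuming textbook RSA blind signatures as the concrete primitive (the text above only states that the mechanism builds on [Chaum et al. 1990]). The key is a toy one, all function names are hypothetical, and the system is assumed to record serial numbers as tokens are presented. It needs Python 3.8 or later for the modular inverse `pow(r, -1, N)`.

```python
import math
import secrets

# Toy RSA key of the recommender system (far too small for real use).
N, E, D = 3233, 17, 2753          # N = 61 * 53 and E * D = 1 (mod 3120)

def random_blinding_factor():
    """User side: pick a random factor that is invertible modulo N."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return r

def blind(serial, r):
    """User side: hide the token serial number with blinding factor r."""
    return (serial * pow(r, E, N)) % N

def sign_blinded(blinded):
    """System side: sign with the private key without seeing the serial."""
    return pow(blinded, D, N)

def unblind(blind_sig, r):
    """User side: strip the blinding factor, leaving a signature on serial."""
    return (blind_sig * pow(r, -1, N)) % N

def verify(serial, sig):
    """Anyone: check the signature with the public key (N, E)."""
    return pow(sig, E, N) == serial % N

spent = set()                     # serial numbers of tokens already presented

def redeem(serial, sig):
    """System side: accept a token once; reject forgeries and double use."""
    if not verify(serial, sig) or serial in spent:
        return False
    spent.add(serial)
    return True

serial = 1234                     # hypothetical token serial number (< N)
r = random_blinding_factor()
signature = unblind(sign_blinded(blind(serial, r)), r)

first_use = redeem(serial, signature)     # accepted
second_use = redeem(serial, signature)    # rejected as double use
```

A real deployment would use a full-size key and a hashed or padded token format; the sketch is only meant to show that the system signs a value it cannot read, yet can later verify the unblinded token and reject its reuse.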

4. MAPPING USERS TO NYMS 

BLC decomposes the user-rating matrix R as $P^T \tilde{U}^T V$. The primary challenges are (i)
finding an appropriate assignment of users to nyms, i.e. the selection of matrix P, and
(ii) the calculation of this assignment in a privacy-enhanced manner.

4.1. Statistical Model 

As in classic matrix factorization approaches, let vector $U_u \in \mathbb{R}^d$ associated with user
u capture the user's preferences in the latent feature space of dimension d, and gather
these vectors together to form matrix $U \in \mathbb{R}^{d \times n}$. Similarly, let vector $V_v \in \mathbb{R}^d$ associated
with resource v capture its features in the latent space and gather these
vectors together to form matrix $V \in \mathbb{R}^{d \times m}$. The rating of item v by user u is a Gaussian
random variable with mean $U_u^T V_v$ and the matrix of ratings of all items by all users
is therefore a Gaussian random variable with mean $U^T V$.

Let us now depart from the usual setup by further assuming that users belong to
distinct groups.

Assumption 1 (Nym Decomposition). Matrix U can be decomposed as $U = \tilde{U} P$
where $\tilde{U} \in \mathbb{R}^{d \times p}$ and $P \in \mathbb{R}^{p \times n}$.

We can think of vector $\tilde{U}_g \in \mathbb{R}^d$, $g = 1, \dots, p$, as the preference vector associated
with a group of users. The preference vectors of the users belonging to that group can
then be viewed as random variables with mean $\tilde{U}_g$. We refer to each group g as a nym.
Matrix P then maps from nyms to users, with the u'th column $P_u \in \mathbb{R}^p$ of P defining the
mapping from the nyms to user u.

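The rating supplied by user u for resource v is a Gaussian random variable $X_{R_{uv}} \in \mathbb{R}$
with mean $U_u^T V_v = (\tilde{U} P_u)^T V_v$ and variance $\sigma^2$. That is,

$$\mathrm{Prob}(X_{R_{uv}} = R_{uv} \mid \tilde{U}, V; P) \sim e^{-\phi_{uv}(R_{uv})/\sigma^2}$$

where $\phi_{uv}(R_{uv}) := (R_{uv} - P_u^T \tilde{U}^T V_v)^2$. Gathering the user ratings into random matrix
$X_R \in \mathbb{R}^{n \times m}$, let $O \subset \{1, \dots, n\} \times \{1, \dots, m\}$ denote the set of user-resource rating
pairs that are contributed by the users. Letting $X_O = \{X_{R_{uv}}, (u,v) \in O\}$ and $\mathcal{R}_O =
\{R_{uv}, (u,v) \in O\}$, the conditional distribution over these observed ratings is

$$\mathrm{Prob}(X_O = \mathcal{R}_O \mid \tilde{U}, V; P) := p(\mathcal{R}_O \mid \tilde{U}, V; P) \sim \prod_{(u,v) \in O} e^{-\phi_{uv}(R_{uv})/\sigma^2}.$$

The posterior distribution is

$$p(\tilde{U}, V \mid \mathcal{R}_O; P) = \frac{p(\mathcal{R}_O \mid \tilde{U}, V; P)\, p(\tilde{U})\, p(V)}{p(\mathcal{R}_O)}$$

and so the log-posterior is

$$\log p(\tilde{U}, V \mid \mathcal{R}_O; P) = -\frac{1}{\sigma^2} \sum_{(u,v) \in O} \phi_{uv}(R_{uv}) - \frac{1}{\sigma_U^2} \mathrm{tr}\, \tilde{U}^T \tilde{U} - \frac{1}{\sigma_V^2} \mathrm{tr}\, V^T V + C \qquad (1)$$

where C is a normalising constant and we have assumed Gaussian priors for $\tilde{U}$ and V
with zero mean and variance $\sigma_U^2$ and $\sigma_V^2$, respectively.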
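
For concreteness, the following sketch evaluates the log-posterior in the single-nym case. It assumes that (1) has the form $-\frac{1}{\sigma^2}\sum \phi_{uv}(R_{uv}) - \frac{1}{\sigma_U^2}\mathrm{tr}\,\tilde{U}^T\tilde{U} - \frac{1}{\sigma_V^2}\mathrm{tr}\,V^TV$ with the constant C dropped, and that each user belongs to exactly one nym. The function name `log_posterior` and the data layout (lists of latent vectors, a dictionary of observed ratings) are illustrative choices.

```python
def log_posterior(ratings, nym_of, U_tilde, V,
                  sigma2=1.0, sigma2_U=1.0, sigma2_V=1.0):
    """Log-posterior (1) up to the additive constant C.

    ratings : dict mapping (u, v) -> R_uv for the observed pairs in O
    nym_of  : list, nym_of[u] is the single nym g with P_u = e_g
    U_tilde : list of p latent vectors (one per nym), each of length d
    V       : list of m latent vectors (one per item), each of length d
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Sum of phi_uv(R_uv) = (R_uv - P_u^T U_tilde^T V_v)^2 over observed pairs.
    fit = sum((r - dot(U_tilde[nym_of[u]], V[v])) ** 2
              for (u, v), r in ratings.items())
    reg_U = sum(dot(vec, vec) for vec in U_tilde)   # tr(U_tilde^T U_tilde)
    reg_V = sum(dot(vec, vec) for vec in V)         # tr(V^T V)
    return -fit / sigma2 - reg_U / sigma2_U - reg_V / sigma2_V

# Two users in different nyms, one item; the predictions match the ratings
# exactly, so only the two prior terms contribute.
ratings = {(0, 0): 3.0, (1, 0): 1.0}
nym_of = [0, 1]
U_tilde = [[1.0, 0.0], [0.0, 1.0]]
V = [[3.0, 1.0]]
value = log_posterior(ratings, nym_of, U_tilde, V)
```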

4.2. Private Nym-Based Matrix Factorization 

Use of nyms aside, the log-posterior expression (1) is of the standard form widely used
in the matrix factorization recommender literature. Finding the matrices $\tilde{U}$, V maximising
$\log p(\tilde{U}, V \mid \mathcal{R}_O; P)$ requires knowledge of the rating $R_{uv}$ made by each user
and when this can be linked to the user identity, e.g. via a cookie, then this is evidently
non-private. However, when the user ratings possess a nym structure we have
the following key observation:

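Lemma 4.1 (Equivalence of log-posterior). Let
$\mathcal{U}(v) = \{u : (u,v) \in O\}$ be the set of users rating item v, and $\mathcal{V} = \{v : (u,v) \in O, u \in
\{1, \dots, n\}\}$ the set of items for which ratings are observed. Suppose matrix $\Lambda(v) :=
\sum_{u \in \mathcal{U}(v)} P_u P_u^T$ is non-singular. Then matrices $\tilde{U}$, V maximise log-posterior (1) if and
only if they maximise $\log p(\tilde{U}, V \mid \tilde{\mathcal{R}}_O; P)$, i.e.

$$\arg\max_{\tilde{U},V} \log p(\tilde{U}, V \mid \mathcal{R}_O; P) = \arg\max_{\tilde{U},V} \log p(\tilde{U}, V \mid \tilde{\mathcal{R}}_O; P) \qquad (2)$$

where

$$\log p(\tilde{U}, V \mid \tilde{\mathcal{R}}_O; P) = -\frac{1}{\sigma^2} \sum_{v \in \mathcal{V}} (\tilde{R}_v - \tilde{U}^T V_v)^T \Lambda(v) (\tilde{R}_v - \tilde{U}^T V_v) - \frac{1}{\sigma_U^2} \mathrm{tr}\, \tilde{U}^T \tilde{U} - \frac{1}{\sigma_V^2} \mathrm{tr}\, V^T V \qquad (3)$$

and $\tilde{R}_v = \Lambda(v)^{-1} \sum_{u \in \mathcal{U}(v)} R_{uv} P_u$.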

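Proof. Letting $\psi := -\frac{1}{\sigma_U^2} \mathrm{tr}\, \tilde{U}^T \tilde{U} - \frac{1}{\sigma_V^2} \mathrm{tr}\, V^T V$, then

$$\arg\max_{\tilde{U},V} \log p(\tilde{U}, V \mid \mathcal{R}_O; P) = \arg\max_{\tilde{U},V} -\frac{1}{\sigma^2} \sum_{(u,v) \in O} (R_{uv} - (\tilde{U} P_u)^T V_v)^2 + \psi \qquad (4)$$

$$\stackrel{(a)}{=} \arg\max_{\tilde{U},V} -\frac{1}{\sigma^2} \sum_{(u,v) \in O} \left( -2 R_{uv} (\tilde{U} P_u)^T V_v + ((\tilde{U} P_u)^T V_v)^2 \right) + \psi$$

$$= \arg\max_{\tilde{U},V} -\frac{1}{\sigma^2} \sum_{v \in \mathcal{V}} \sum_{u \in \mathcal{U}(v)} \left( -2 R_{uv} P_u^T \tilde{U}^T V_v + V_v^T \tilde{U} P_u P_u^T \tilde{U}^T V_v \right) + \psi$$

where (a) follows from the fact that $R_{uv}^2$ does not depend on $\tilde{U}$ or V. Hence, letting
$\hat{R}_v := \sum_{u \in \mathcal{U}(v)} R_{uv} P_u$ and using the fact that $\Lambda(v)$ is non-singular, we have that

$$\arg\max_{\tilde{U},V} \log p(\tilde{U}, V \mid \mathcal{R}_O; P) = \arg\max_{\tilde{U},V} -\frac{1}{\sigma^2} \sum_{v \in \mathcal{V}} \left( -2 \hat{R}_v^T \tilde{U}^T V_v + V_v^T \tilde{U} \Big( \sum_{u \in \mathcal{U}(v)} P_u P_u^T \Big) \tilde{U}^T V_v \right) + \psi$$

$$= \arg\max_{\tilde{U},V} -\frac{1}{\sigma^2} \sum_{v \in \mathcal{V}} \left( -2 \hat{R}_v^T \tilde{U}^T V_v + (\tilde{U}^T V_v)^T \Lambda(v) \tilde{U}^T V_v \right) + \psi \qquad (5)$$

$$= \arg\max_{\tilde{U},V} -\frac{1}{\sigma^2} \sum_{v \in \mathcal{V}} \left( \hat{R}_v^T (\Lambda(v)^{-1})^T \hat{R}_v - 2 \hat{R}_v^T \tilde{U}^T V_v + (\tilde{U}^T V_v)^T \Lambda(v) \tilde{U}^T V_v \right) + \psi \qquad (6)$$

$$= \arg\max_{\tilde{U},V} \log p(\tilde{U}, V \mid \tilde{\mathcal{R}}_O; P), \qquad (7)$$

where (6) follows from the fact that $\hat{R}_v^T (\Lambda(v)^{-1})^T \hat{R}_v$ does not depend on $\tilde{U}$ or V, and (7)
follows on substituting $\hat{R}_v = \Lambda(v) \tilde{R}_v$ and using the symmetry of $\Lambda(v)$. □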

The condition that $\Lambda(v)$ is non-singular requires that the set of users rating item v is
non-degenerate in the sense that every nym is used by at least one user (no zero rows
in $\Lambda(v)$) and the vectors of user assignments to each nym are linearly independent (the
nyms are distinct).

By Lemma 4.1, matrices $\tilde{U}$, V maximising $\log p(\tilde{U}, V \mid \tilde{\mathcal{R}}_O; P)$ in (3) also maximise
the log-posterior (1). As we will discuss in more detail shortly, the importance of
Lemma 4.1 is that finding matrices $\tilde{U}$, V maximising (3) only requires observations of
nym ratings $\tilde{R}_v$ and matrices $\Lambda(v)$, $\forall v$. That is, it does not require that individual user
ratings are observed by the recommender system.

Of particular interest is the special case where users are each assigned to a single
nym, and so the columns $P_u$ of matrix P have a single non-zero element (corresponding
to the nym chosen) equal to 1. In this case matrix $\Lambda(v)$ is a diagonal matrix with
element $\Lambda(v)_{gg}$ equal to the number of users who rate item v using nym g and nym
rating vector $\tilde{R}_v$ has element g equal to the average of the ratings for item v by
users in nym g. Using Lemma 4.1 we then have:
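
The two aggregates in this special case can be computed as in the sketch below. The function name `aggregate` and the dictionary layout are illustrative assumptions; nyms with no rating for an item are simply absent from the result, which corresponds to a zero diagonal entry (and hence a singular $\Lambda(v)$) for that item.

```python
from collections import defaultdict

def aggregate(ratings, nym_of):
    """Return (counts, means) for the single-nym case.

    ratings : dict mapping (u, v) -> R_uv
    nym_of  : list, nym_of[u] is the nym chosen by user u
    counts[v][g] is the diagonal entry Lambda(v)_gg, the number of users in
    nym g who rated item v; means[v][g] is element g of R_tilde_v, the
    average rating of item v within nym g.
    """
    counts = defaultdict(lambda: defaultdict(int))
    sums = defaultdict(lambda: defaultdict(float))
    for (u, v), r in ratings.items():
        g = nym_of[u]
        counts[v][g] += 1
        sums[v][g] += r
    means = {v: {g: sums[v][g] / c for g, c in per_nym.items()}
             for v, per_nym in counts.items()}
    return {v: dict(per_nym) for v, per_nym in counts.items()}, means

# Users 0 and 1 share nym 0, user 2 uses nym 1.
ratings = {(0, 0): 4, (1, 0): 2, (2, 0): 5, (0, 1): 3}
counts, means = aggregate(ratings, nym_of=[0, 0, 1])
```

For item 0, nym 0 contributes two ratings with average 3.0 and nym 1 a single rating of 5.0; these per-nym counts and averages are all that the results below require.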

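Theorem 4.2 (Private Nym Factorization). Suppose $\Lambda(v)$ is diagonal. Let $\tilde{U}$
and V satisfy

$$\tilde{U}_g^T = \Big( \sum_{v \in \mathcal{V}} \Lambda(v)_{gg} \tilde{R}_{gv} V_v^T \Big) \Big( \frac{\sigma^2}{\sigma_U^2} I + \sum_{v \in \mathcal{V}} \Lambda(v)_{gg} V_v V_v^T \Big)^{-1} \qquad (8)$$

$$V_v^T = \tilde{R}_v^T \Lambda(v) \tilde{U}^T \Big( \frac{\sigma^2}{\sigma_V^2} I + \tilde{U} \Lambda(v) \tilde{U}^T \Big)^{-1}. \qquad (9)$$

Then $\tilde{U}$, V is a stationary point of (3).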

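Proof. Recall that $\mathrm{tr}\, V^T V = \sum_v \sum_i (V_{vi})^2$ and so the derivative with respect to
vector $V_v$ is $2 V_v$. Observe also that the derivative of $(\tilde{R}_v - \tilde{U}^T V_v)^T \Lambda(v) (\tilde{R}_v - \tilde{U}^T V_v)$
with respect to vector $V_v$ is $-2 \tilde{U} \Lambda(v) (\tilde{R}_v - \tilde{U}^T V_v)$. Hence, differentiating (3) with
respect to $V_v$ yields

$$\frac{2}{\sigma^2} \tilde{U} \Lambda(v) (\tilde{R}_v - \tilde{U}^T V_v) - \frac{2}{\sigma_V^2} V_v \qquad (10)$$

Setting this equal to zero and rearranging yields (9). Similarly, differentiating (3) with
respect to $\tilde{U}_g$ and setting equal to zero yields (8). □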

It follows from Theorem 4.2 that to determine the matrices $\tilde{U}$, V maximising
$\log p(\tilde{U}, V \mid \tilde{\mathcal{R}}_O; P)$ all that the recommender system needs to know is (i)
the average ratings for each nym-item pair (namely, matrix $\tilde{R}$) and (ii) the number of
users in each nym who rate item v (namely, $\Lambda(v)$). There is no need for users to reveal
their ratings in an individually-identifiable way.

Regarding the nym to user mapping P, observe that the log-posterior is separable
in the columns of P,

$$\arg\max_P \log p(\tilde{U}, V \mid \mathcal{R}_O; P) = \arg\min_P \sum_{(u,v) \in O} (R_{uv} - P_u^T \tilde{U}^T V_v)^2, \qquad \min_P \sum_{(u,v) \in O} (R_{uv} - P_u^T \tilde{U}^T V_v)^2 = \sum_{u \in \mathcal{U}} \min_{P_u} \sum_{v \in \mathcal{V}(u)} (R_{uv} - P_u^T \tilde{U}^T V_v)^2 \qquad (11)$$

where $\mathcal{U} = \{u : (u,v) \in O, v \in \{1, \dots, m\}\}$ is the set of users providing ratings and
$\mathcal{V}(u) = \{v : (u,v) \in O\}$ is the set of items rated by user u. Hence, we can find the column
$P_u$ for each user u individually by solving $\min_{P_u} \sum_{v \in \mathcal{V}(u)} (R_{uv} - P_u^T \tilde{U}^T V_v)^2$. Provided
the nym-item factorization matrices $\tilde{U}$, V are made available to users by the recommender
system, this optimisation can be carried out privately by each user on their
own computer using their locally stored ratings $R_{uv}$. There is no need to release the
individual user ratings to the recommender system or to other users.
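
The user-side step can be sketched as follows for the single-nym case, where $P_u$ is restricted to the unit vectors. The function name `choose_nym` and the list-of-vectors layout for the published factors are illustrative assumptions; the point is only that the computation uses the user's own ratings and nothing else that is private.

```python
def choose_nym(my_ratings, U_tilde, V):
    """Return the nym g minimising sum_v (R_uv - U_tilde_g^T V_v)^2.

    my_ratings : dict item index v -> R_uv, kept on the user's own device
    U_tilde    : published list of p nym vectors, each of length d
    V          : published list of m item vectors, each of length d
    """
    def cost(g):
        return sum((r - sum(a * b for a, b in zip(U_tilde[g], V[v]))) ** 2
                   for v, r in my_ratings.items())
    return min(range(len(U_tilde)), key=cost)

# Nym 0 predicts ratings (3, 1) for items (0, 1); nym 1 predicts (1, 4).
U_tilde = [[1.0, 0.0], [0.0, 1.0]]
V = [[3.0, 1.0], [1.0, 4.0]]
best = choose_nym({0: 1.0, 1: 5.0}, U_tilde, V)   # squared errors: 20 vs 1
```

With p nyms and a handful of ratings per user this is a search over p candidate costs, so it is cheap enough to run on the user's own device.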

4.3. BLC Algorithm 

The foregoing observations suggest an iterative algorithm that seeks to maximise the
log posterior (1) by alternating between the following two steps:

(1) Using the current estimates for the matrix of average nym-item ratings $\tilde{R}$ and the
number $\Lambda(v)$ of users in each nym who rate item v, estimate $\tilde{U}$, V via Theorem 4.2.

(2) Given the current estimates for $\tilde{U}$, V each user u privately updates their column
$P_u$ in P by solving $\min_{x \in I} \sum_{v \in \mathcal{V}(u)} (R_{uv} - x^T \tilde{U}^T V_v)^2$, where $I = \{e_i, i = 1, \dots, p\}$ and $e_i$ is
the vector for which element i equals 1 and all other elements equal 0. This optimisation
can be trivially solved by simply calculating the objective for each element
in the (small) set I and selecting the lowest valued. The updated $\tilde{R}$ and $\Lambda(v)$ are then
shared with the recommender system.

This two-stage process is given in more detail in Algorithm 1. At each iteration, each
vector $\tilde{U}_g$ of nyms g that are used by at least one user is updated using the current
estimate of V. Then V is updated using the current value of $\tilde{U}$. This is repeated until
the improvement in the log-posterior falls below tolerance $\epsilon$. The users then update
their choice of nym privately in a distributed manner and $\tilde{R}$, $\Lambda(v)$ are then updated
accordingly.
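
The alternation can be sketched end to end in plain Python as below. This is a simplified illustration under stated assumptions rather than the reference implementation: it assumes single-nym membership, writes the updates of Theorem 4.2 as weighted ridge regressions with weights `lam_U` and `lam_V` standing for $\sigma^2/\sigma_U^2$ and $\sigma^2/\sigma_V^2$, and runs a fixed number of sweeps instead of testing a tolerance. All function names and default values are hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def weighted_ridge(samples, lam, d):
    """Minimise sum_i w_i (y_i - x^T z_i)^2 + lam ||x||^2; samples are (w, y, z)."""
    A = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for w, y, z in samples:
        for i in range(d):
            b[i] += w * y * z[i]
            for j in range(d):
                A[i][j] += w * z[i] * z[j]
    return solve(A, b)

def objective(ratings, nym_of, U, V, lam_U, lam_V):
    """sigma^2 times the negative log-posterior (1), without the constant."""
    fit = sum((r - dot(U[nym_of[u]], V[v])) ** 2 for (u, v), r in ratings.items())
    return fit + lam_U * sum(dot(x, x) for x in U) + lam_V * sum(dot(x, x) for x in V)

def blc(ratings, n, m, p, d, lam_U=0.1, lam_V=0.1, rounds=5, sweeps=10, seed=0):
    rng = random.Random(seed)
    U = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(p)]   # one vector per nym
    V = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(m)]   # one vector per item
    nym_of = [rng.randrange(p) for _ in range(n)]                 # initial nym choices
    history = []
    for _round in range(rounds):
        # Aggregates shared with the system: counts and mean ratings per (nym, item).
        count, total = {}, {}
        for (u, v), r in ratings.items():
            key = (nym_of[u], v)
            count[key] = count.get(key, 0) + 1
            total[key] = total.get(key, 0.0) + r
        mean = {key: total[key] / count[key] for key in count}
        for _sweep in range(sweeps):                  # Block 1: system factorisation
            for g in range(p):                        # update (8), unused nyms skipped
                samples = [(c, mean[(h, v)], V[v]) for (h, v), c in count.items() if h == g]
                if samples:
                    U[g] = weighted_ridge(samples, lam_U, d)
            for v in range(m):                        # update (9)
                samples = [(c, mean[(g, w)], U[g]) for (g, w), c in count.items() if w == v]
                if samples:
                    V[v] = weighted_ridge(samples, lam_V, d)
        for u in range(n):                            # Block 2: private nym choice
            mine = [(v, r) for (w, v), r in ratings.items() if w == u]
            def cost(g):
                return sum((r - dot(U[g], V[v])) ** 2 for v, r in mine)
            nym_of[u] = min(range(p), key=cost)
        history.append(objective(ratings, nym_of, U, V, lam_U, lam_V))
    return nym_of, U, V, history

# Hypothetical block-structured ratings: two groups of users with opposite tastes.
ratings = {(u, v): (5.0 if (u < 3) == (v < 2) else 1.0)
           for u in range(6) for v in range(4)}
nym_of, U, V, history = blc(ratings, n=6, m=4, p=2, d=2)
```

Because each of the three updates minimises the same regularised squared error with the other quantities held fixed, the values recorded in `history` should not increase from one round to the next, which gives a simple sanity check on an implementation.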

4.4. Convergence 

Since each update in Algorithm 1 is either a descent step or minimisation step, the 
algorithm is convergent and, in particular, we have: 

Theorem 4.3 (Convergence of BLC Algorithm). Suppose $\tilde{U}$ and V remain
bounded. Then the sequence generated by the alternating updates (8) and (11) converges
to a point of $-\log p(\tilde{U}, V \mid \mathcal{R}_O; P)$ that is stationary for $\tilde{U}$ and V and a local
minimum for P.

Proof. See Appendix. □ 

4.5. Discussion 

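Algorithm 1 BLC Nym Recommender

loop
    Initialise $E, \Delta \leftarrow \infty$, $\tilde{U} \leftarrow \mathcal{N}(0, a)$, $V \leftarrow \mathcal{N}(0, a)$
    while $\Delta > \epsilon$ do {Block 1. System factorization}
        for $g = 1, \dots, p$ do
            if nym g is used by at least one user then
                $\tilde{U}_g^T \leftarrow \big( \sum_{v \in \mathcal{V}} \Lambda(v)_{gg} \tilde{R}_{gv} V_v^T \big) \big( \frac{\sigma^2}{\sigma_U^2} I + \sum_{v \in \mathcal{V}} \Lambda(v)_{gg} V_v V_v^T \big)^{-1}$
            end if
        end for
        for all $v \in \mathcal{V}$ do
            $V_v^T \leftarrow \tilde{R}_v^T \Lambda(v) \tilde{U}^T \big( \frac{\sigma^2}{\sigma_V^2} I + \tilde{U} \Lambda(v) \tilde{U}^T \big)^{-1}$
        end for
        $E' \leftarrow \log p(\tilde{U}, V \mid \tilde{\mathcal{R}}_O; P)$
        $\Delta \leftarrow |E' - E|$, $E \leftarrow E'$
    end while
    for all $u \in \mathcal{U}$ do {Block 2. User private nym choice}
        $P_u \leftarrow \arg\min_{x \in I} \sum_{v \in \mathcal{V}(u)} (R_{uv} - x^T \tilde{U}^T V_v)^2$
        for all $v \in \mathcal{V}(u)$ do
            $\Lambda(v) = \sum_{u \in \mathcal{U}(v)} P_u P_u^T$
        end for
        $\tilde{R}_v = \Lambda^{-1}(v) \sum_{u \in \mathcal{U}(v)} R_{uv} P_u$, $\forall v \in \mathcal{V}$
    end for
end loop

4.5.1. Privacy-Preserving Calculation of $\tilde{R}$, $\Lambda(v)$. Once users have updated the columns of
P, the updated values of $\tilde{R}$, $\Lambda(v)$ need to be shared with the recommender system.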

This can be readily carried out in a privacy preserving manner and without the need
for communication amongst the users.

To see this, consider first a simplified batch update where at each iteration users resubmit
their ratings using their current choice of nym. Assuming users access the system
via an appropriate anonymising connection, e.g. via Tor [Dingledine et al. 2004],
then this update need not reveal the user identities. Given these ratings, the matrix
$\tilde{R}$ of average item ratings by each nym can be immediately determined by the recommender
system. The number of users sharing each nym can also be estimated, e.g.
by counting the number of ratings for each item (assuming that each user submits at
most one rating for each item, multiple ratings for an item provide an indication of
the number of users sharing the nym). The extension of this to online submission of
ratings is now straightforward. This setup is illustrated schematically in Figure 3.
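
One way the online variant might look on the system side is sketched below: each anonymous submission updates a running count and a running mean for its (nym, item) pair, so no individual rating needs to be retained. The class and function names are hypothetical, and the sketch keeps the assumption stated above that each user submits at most one rating per item.

```python
class NymItemStats:
    """Running count and mean of the ratings received for one (nym, item) pair."""

    def __init__(self):
        self.count = 0      # estimate of Lambda(v)_gg
        self.mean = 0.0     # estimate of element g of R_tilde_v

    def add(self, rating):
        # Incremental mean update; the individual rating is not stored.
        self.count += 1
        self.mean += (rating - self.mean) / self.count

stats = {}

def submit(nym, item, rating):
    """Handle one anonymous (nym, item, rating) submission."""
    stats.setdefault((nym, item), NymItemStats()).add(rating)

# Three users sharing nym 1 rate item 42.
for rating in (4.0, 2.0, 3.0):
    submit(nym=1, item=42, rating=rating)
```

After these three submissions the pair (1, 42) has a count of 3 and a mean of 3.0, which are exactly the quantities needed by the factorisation step.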


4.5.2. Scalability. The BLC algorithm has favourable characteristics from the point of
view of scalability. Firstly, the nym-item rating matrix $\tilde{R}$ which needs to be factorized
is of dimension $p \times m$. In comparison, the user-item rating matrix R is of dimension
$n \times m$. Hence, the work scales with the number of nyms p rather than the number of
users n and we expect $p \ll n$ (as we show in Section 5, in real scenarios n > 70 000
and p < 200). Secondly, the nym-user mapping matrix P is of dimension $p \times n$ but each
column is updated by a separate user in a fully distributed manner, with no message
passing or other co-ordination required between users. Hence, the update of P is of the
“embarrassingly parallel” type [Srirama et al. 2012] and so highly scalable.


ACM Transactions on Information and System Security, Vol. V, No. N, Article A, Publication date: January YYYY. 






[Figure 3: user u (1) privately selects a nym, $P_u \leftarrow \arg\min \sum (\cdot)^2$, using the factors $\tilde{U}$, V received from the system, and (2) submits ratings as (rating, nym) pairs; the system (1) estimates $\Lambda(v)$, (2) calculates $\tilde{R}$ and (3) factorizes $\tilde{R}$ as $\tilde{U}^T V$.]

Fig. 3. Illustrating partitioning of operations in Algorithm 1 into a private user component and a public system
component, with exchange only of nym-related information between both (no exchange of individually-identifying
user data).

5. EXPERIMENTS USING SYNTHETIC DATA 

We begin by using synthetic data sets where, by construction, we know ground truth
regarding the number of nyms and so can evaluate the performance of the BLC algorithm
against this. The setup considered in this section is intentionally tightly controlled
so that we can vary one aspect at a time and study its impact. The performance
with real ratings data will be considered later. We compare BLC with the classic matrix
factorization algorithm [Koren et al. 2009].

5.1. Synthetic Data 

To generate a data set with p user groups we i) select p points in the d-dimensional
feature space as “group centres”, and ii) randomly draw n/p simulated users for each
group, with position drawn from a Gaussian distribution with mean equal to the group
centre, and variable standard deviation. For instance, with p = 5 and a standard
deviation equal to 0.01, the users are tightly clustered around each of the group centres
as illustrated in Figure 4a. We generate item matrix V by drawing its elements from a
Gaussian distribution, and thus obtain the full user-item matrix R as U^T V. We then
remove entries uniformly at random from R to reach a desired degree of sparsity.
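
The generation procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the group centres are drawn from a standard Gaussian (the text does not say how they are selected), each entry is removed independently with probability `missing` (which reaches the target sparsity only approximately), and the function name `make_synthetic` and its parameters are hypothetical.

```python
import random

def make_synthetic(n, m, p, d, spread, missing, seed=0):
    """Generate clustered users, Gaussian items and a sparse rating matrix.

    Returns (R, groups): R[u][v] is a rating, or None if the entry was removed;
    groups[u] is the index of the group centre that user u was drawn around.
    """
    rng = random.Random(seed)
    # i) p group centres in the d-dimensional feature space (assumed Gaussian).
    centres = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(p)]
    # ii) n/p users per group, Gaussian around the group centre.
    groups = [g for g in range(p) for _ in range(n // p)]
    U = [[rng.gauss(c, spread) for c in centres[g]] for g in groups]
    # Item features drawn from a Gaussian distribution.
    V = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(m)]
    # Full rating matrix R = U^T V, with each entry dropped with prob. `missing`.
    R = [[None if rng.random() < missing
          else sum(a * b for a, b in zip(U[u], V[v]))
          for v in range(m)] for u in range(len(U))]
    return R, groups

R, groups = make_synthetic(n=100, m=20, p=5, d=4, spread=0.01, missing=0.5)
```

With `spread=0.01` users in the same group have almost identical rows of R, which is the tightly clustered regime described for Figure 4a.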

5.2. Performance vs Number of Nyms 

Figure 4b shows the Root Mean Square Error (RMSE) of the predicted ratings obtained
using the P, U, V matrix factorization found by the BLC algorithm vs the number of
nyms. The data shown is the average over 50 datasets drawn randomly as described
above, with error bars indicating one standard deviation. In this first example
there are m = 100 items, the latent feature space is of dimension d = 4, there
are n = 10 000 users and p = 5 nyms, the users are clustered with a small standard
deviation around the nym centres (i.e. tightly bunched), and 50 % of the user-item
ratings are missing. It can be seen that as the number of nyms used by the BLC
algorithm is increased there is a sharp reduction in the RMSE once at least 5 nyms are
used. This is not surprising, but it illustrates the ability of the BLC algorithm
to correctly infer nym structure from the user-item rating data.
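
For concreteness, the RMSE of nym-based predictions on held-out ratings can be computed as in the following sketch. The data layout (U and V as lists of feature vectors, P as a list giving each user's nym) and the names `predict` and `rmse` are assumptions made for illustration only.

```python
import math

def predict(nym, U, V, v):
    """Predicted rating of item v for a user assigned to the given nym."""
    return sum(a * b for a, b in zip(U[nym], V[v]))

def rmse(held_out, P, U, V):
    """RMSE over held-out (user, item, rating) triples."""
    errors = [(r - predict(P[u], U, V, v)) ** 2 for u, v, r in held_out]
    return math.sqrt(sum(errors) / len(errors))

U = [[1.0, 0.0], [0.0, 1.0]]            # p = 2 nym feature vectors
V = [[4.0, 1.0], [2.0, 5.0]]            # m = 2 item feature vectors
P = [0, 1, 1]                           # nym chosen by each of n = 3 users
held_out = [(0, 0, 4.0), (1, 1, 3.0), (2, 0, 1.0)]
error = rmse(held_out, P, U, V)
```

Every user in a nym receives the same prediction for a given item, so the error reflects how well the nym feature vector represents its members.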

Also shown in Figure 4b is the RMSE of the predicted ratings when classic matrix
factorization is used (where P is fixed to be the n × n identity matrix). It can be seen
that once the number of nyms used is 5 or greater, the RMSE of the BLC approach is
consistently lower than that of the classic matrix factorization approach. That is, by
taking advantage of the nym structure within the user ratings the BLC approach is
able to make more accurate predictions since users sharing the same nym can leverage
each other's ratings.

Fig. 4. (a) 2D projection of the positions in the d = 4 dimensional latent space of the users (in blue) and

Fig. 5. Prediction RMSE vs. missing values ratio for BLC (a) with 5 groups and standard matrix
factorization (b), for different numbers of users.

Figure 5a plots the RMSE of the BLC predicted ratings vs the sparsity of the observed
ratings matrix and the number n of users. The data shown is the average over
100 randomly drawn datasets and error bars are omitted because their size is negligible.
It can be seen that as the number of users increases the prediction accuracy
improves when the rating matrix is sparse. For n = 1000 or more users the prediction
accuracy is insensitive to the fraction of missing values until this fraction exceeds 95 %.
The poorer performance with smaller numbers of users is due to the reduced statistical
multiplexing: the probability that an item has no ratings by any member of a nym
becomes significant for sparse matrices with only a small number of users. Conversely,
when there are larger numbers of users the nym structure allows users sharing a nym
to leverage each other's ratings to improve prediction accuracy when the ratings data
is sparse.
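
The statistical multiplexing argument can be made concrete with a short calculation. The sketch below assumes equal-sized nyms and that each rating is removed independently with the stated probability (consistent with the uniform removal of Section 5.1); the function name `prob_item_unrated` and the user counts are illustrative only.

```python
def prob_item_unrated(missing_fraction, n_users, n_nyms):
    """P(no member of a nym has rated a given item), assuming equal-sized
    nyms and ratings removed independently with the given probability."""
    members = n_users // n_nyms
    return missing_fraction ** members

# With 95 % of ratings missing and 5 nyms, compare small and large user counts.
for n in (10, 100, 1000):
    print(n, prob_item_unrated(0.95, n, 5))
```

Under these assumptions the probability decays geometrically in the nym size, which is why the effect matters only when the number of users per nym is small.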

Figure 5b plots the corresponding results when using the classic matrix factorization 
approach. It can be seen that for all numbers of users the prediction accuracy decreases 
as the observed ratings become more sparse. This approach cannot exploit the nym 
structure to improve predictions when the user data is sparse. 
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Fig. 6. RMSE of the predicted ratings vs. the standard deviation of users around the nym centres in the 
latent feature space, n = 10 000 users, p = 5, m = 100, d = 4 and fraction of missing ratings 5 % and 90 %.

5.3. Challenging the Nym Assumption 

In the previous section the users were tightly bunched around the nym centres in
the latent feature space. We now relax this and explore the impact of increasing the
standard deviation and so the spread around the centres. Figure 6 shows the RMSE of
the predicted ratings vs. the spread of the users around the nym centres. The inset plot
in Figure 6 shows a 2D projection of the positions in the latent space of the users (in
blue) and nyms (in red) for a standard deviation of 10 about the nym centres. It can be
seen that the prediction error with BLC increases roughly linearly with the standard
deviation. The prediction accuracy is also largely insensitive to the sparsity of the
ratings. For comparison, Figure 6 also shows the performance when the BLC algorithm
is augmented using the extensions described in Sections 7.1 and 7.2. These extensions
introduce adaptation of the number of nyms used, the number selected being shown in
Figure 7. Note that as the standard deviation around the centres in the latent space
increases we are degrading the original nym structure and so effectively increasing the
number of groups, and this is reflected in Figure 7. It can be seen that these extensions
improve prediction performance, but do not change the qualitative behaviour, namely
that the prediction error increases roughly linearly with the standard deviation around
the centres in the latent space.

Also shown in Figure 6 is the corresponding data for the classic matrix factorization
approach. For smaller values of standard deviation, where the data has a strong nym
structure, it can be seen that BLC outperforms the classic approach in terms of
prediction accuracy. However, as the standard deviation becomes large, so that the nym
structure in the ratings is washed out, the prediction accuracy of the classic approach
is better than that of the BLC approach unless the extensions in Sections 7.1 and 7.2
are used.

6. MEMORY AND TIME FOOTPRINT 

To establish whether the proposed technique is practical and scalable, 100 random
scenarios were generated for a variety of combinations of n, m values using 5 clusters with
Gaussian noise as explained in Section 5. BLC algorithm performance is measured on
a single core of an Intel Xeon Processor E3-1270 v3, 3.50 GHz. The overhead necessary
to employ the per-routine measuring of memory and time consumption is significant,
affecting the measured values by up to one order of magnitude. Nevertheless, this
instrumentation is important for understanding how memory and time consumption are
affected by the different parameters of the algorithm.

Fig. 7. Boxplot and median showing the number of nyms corresponding to the BLC_local data in Figure 6.

Figure 8 shows an analysis of the time and memory footprint vs n and m. The
error bars are not shown when the standard deviation is negligible. While memory
usage increases linearly with both the number of users and the number of items, the
convergence time increases linearly only with the number of items, and remains roughly
constant as the number of users is varied.

Importantly, we also analyse the scalability and performance when employing a
warm start, i.e. the re-convergence performance with new but similar data drawn from
the same population after initial convergence. In Figure 9(a) the convergence time to
adjust to new data is analysed. The time to factorize a new group of users grows
linearly with the number of users or items, but it is reasonably small (6 s for 1000 users
and 500 items). Figure 9(b) shows the median time to run a matrix factorisation when
up to 1 % of new users are added or when 1 % of existing ones change their rating:
at worst the time to re-factorise is only around 60 ms. Regarding scalability, from
Figure 10 it is clear that parallelisation has the potential to bring substantial benefits, for
example if the factorization could be run in parallel client-side, since the convergence
time per user tends to a constant value.


7. EXTENSIONS TO BLC ALGORITHM 
7.1. Selecting Number of Nyms 

While BLC is designed to converge starting from any configuration of nyms in the
latent space, in practice it may converge to a local optimum where some nyms stay
unused or where two nyms could be coalesced into one to improve performance. We
have found that a useful approach is therefore to initialise BLC with only 1 nym,
then after convergence we double the number of nyms, adding Gaussian noise to the
new nym feature vectors with standard deviation equal to half of the minimum
inter-nym distance. At each doubling, nyms that are unused are pruned. In this way we
essentially encourage users to progressively split into smaller groups where this leads
to a performance improvement. We repeat this procedure until the factorization error
falls below a threshold.
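
A single doubling step of this procedure might look as follows. This is a sketch under assumptions not stated in the text: unused nyms are pruned before the copies are made, each new nym is a noisy copy of an existing one, and when only one nym remains (so that no inter-nym distance exists) a fallback standard deviation of 1.0 is used. The function name `double_nyms` is hypothetical.

```python
import math
import random

def double_nyms(nym_vectors, members_per_nym, rng):
    """One doubling step: prune unused nyms, then add a noisy copy of each.

    nym_vectors: list of nym feature vectors (lists of floats)
    members_per_nym: number of users currently assigned to each nym
    """
    # Prune nyms that no user has selected.
    kept = [x for x, c in zip(nym_vectors, members_per_nym) if c > 0]
    # Noise standard deviation: half of the minimum inter-nym distance.
    dists = [math.dist(a, b) for i, a in enumerate(kept) for b in kept[i + 1:]]
    sigma = 0.5 * min(dists) if dists else 1.0  # fallback value is an assumption
    copies = [[x + rng.gauss(0, sigma) for x in vec] for vec in kept]
    return kept + copies

rng = random.Random(0)
nyms = double_nyms([[0.0, 0.0], [2.0, 0.0], [9.0, 9.0]], [3, 5, 0], rng)
```

In a full implementation this step would alternate with re-running BLC to convergence, stopping once the factorization error falls below the chosen threshold.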
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Fig. 8. Convergence time and memory usage vs number of users and number of items, for a synthetic 
scenario with 5 clusters. 




Fig. 9. Convergence time for (a) a full iteration after warm start, and (b) a single factorisation after up to
1 percent of ratings change.
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Fig. 10. Convergence time normalised by number of users vs number of users. 

7.2. Local Recommendation 

When users have access to the item profiles V, this allows increased flexibility in the
way in which user ratings are predicted. For example, one approach is to use the
predicted ratings $(U P_u)^T V$ for the nym selected by user u. Alternatively, user u might
exploit the information in their sole possession, i.e. the knowledge of their own private
ratings, to find the point x in the latent space that minimises
$\sum_{v \in \mathcal{V}(u)} (R_{u,v} - x^T V_v)^2$,
given V and the ratings $R_{u,\mathcal{V}(u)}$ already made by user u. This least squares
optimisation can be solved locally by the user on their own computer, and so in a private
manner. When the user has only rated a small number of items, this prediction is
likely to be worse than the nym-based recommendation. However, we can combine the
best of both worlds by solving a joint weighted least squares problem [Draper et al.
1966], in which user u locates the position in the latent space that also minimises the
distance from $U_g$, where g is the nym used by user u. More formally,

where the weighting matrix is a diagonal k × k matrix, and w and a second scalar are design
parameters. The recommendation is then computed as $x^T V$. We will refer to this enhanced
algorithm as BLC_local. We test this technique in Section 8.
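
The idea of pulling the locally fitted position towards the nym position can be sketched as a regularised least squares problem. The sketch below is a simplification and not the weighted objective of this section: it assumes a single scalar weight w in place of the diagonal weighting matrix, and minimises the sum over rated items of (r_v - x^T V_v)^2 plus w times the squared distance between x and U_g. The names `solve` and `local_position` are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def local_position(ratings, V, U_g, w):
    """Minimise sum_v (r_v - x.V_v)^2 + w * |x - U_g|^2 over x.

    ratings: dict item index -> rating held privately by the user
    V: list of item feature vectors; U_g: feature vector of the user's nym
    """
    d = len(U_g)
    # Normal equations: (sum_v V_v V_v^T + w I) x = sum_v r_v V_v + w U_g.
    A = [[w * (i == j) for j in range(d)] for i in range(d)]
    b = [w * U_g[i] for i in range(d)]
    for v, r in ratings.items():
        for i in range(d):
            b[i] += r * V[v][i]
            for j in range(d):
                A[i][j] += V[v][i] * V[v][j]
    return solve(A, b)

V = [[1.0, 0.0], [0.0, 1.0]]
x = local_position({0: 3.0}, V, U_g=[1.0, 2.0], w=1.0)
```

In this toy case the user has rated only item 0, so the second coordinate of x is determined entirely by the nym position, while the first is a compromise between the user's rating and the nym. All of the computation uses only V, U_g and the user's own ratings, so it can run on the user's device.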

7.3. Factorization Frequency 

Fig. 11. Prediction RMSE vs. factorization frequency.

Table I. Dataset parameters.

Dataset     Users     Items     Ratings   Density    Domain
Jester      73 421    100       4.1 M     0.5584     (-10,10)
Movielens   71 567    10 681    10 M      0.0131     (1,5)
Dating      135 359   168 791   17.3 M    0.0007     (1,10)
Books       278 858   271 379   1.1 M     0.000 01   (1,10)
Netflix     17 770    480 189   1.4 M     0.0001     (1,5)

As illustrated in Figure 3, Algorithm 1 is partitioned into a private component run by
each user u ∈ U and a public system component. Given (nym, rating) pairs supplied
by the users, the system component calculates the nym rating matrix R and factorizes it
into U, V using an estimate of Λ(·). The user component run by user u selects the
nym to be used by user u using the current value of U, V. Rather than factorizing R
every time a new rating is supplied or a user changes nym, the system workload can be
reduced by carrying out the factorization less frequently. Figure 11 plots the prediction
RMSE vs. the factorization frequency and the number of users n over 500 samples,
using the same parameters as in the other simulations (but without Gaussian noise).
The update frequency is specified as the fraction of users that have executed their
private part of Algorithm 1 between each factorization, and a subset of users performs
updates between each matrix factorization (user ordering is chosen from a permutation
drawn uniformly at random for every execution of Block 2 of Algorithm 1). It can be
seen that, perhaps unsurprisingly, when we re-factorize after every individual user
update we obtain the best performance, and performance decreases as the factorization
frequency decreases. That is, there is a trade-off between computational effort and
prediction accuracy. However, as the number of users n increases it can be seen that
the performance cost becomes smaller, presumably because the absolute number of user
updates between factorizations is increasing, and for n > 1000 users a factorization
period of 10 % seems reasonable.
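
The factorization schedule can be sketched as a loop over a random permutation of users with a factorization after every fixed fraction of them. This is an illustrative skeleton only: `update_user` and `factorize` are placeholder callbacks standing in for the private and public components of Algorithm 1, and the function name `run_round` and the final factorization at the end of the pass are assumptions.

```python
import random

def run_round(n_users, period_fraction, update_user, factorize, rng):
    """One pass over all users, factorizing after every `period_fraction` of them.

    Returns the number of factorizations performed during the pass.
    """
    order = list(range(n_users))
    rng.shuffle(order)                      # permutation drawn uniformly at random
    batch = max(1, round(period_fraction * n_users))
    n_factorizations = 0
    for i, u in enumerate(order, start=1):
        update_user(u)                      # private step: user u re-selects a nym
        if i % batch == 0 or i == n_users:
            factorize()                     # public step: recompute U, V
            n_factorizations += 1
    return n_factorizations

visited = []
count = run_round(1000, 0.1, visited.append, lambda: None, random.Random(0))
```

With a factorization period of 10 % and n = 1000 users, the system factorizes 10 times per pass instead of 1000 times, which is the computational saving traded against prediction accuracy in Figure 11.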

7.4. Cold Start 

In all of the experiments, we simulate the problem of cold start by starting with no
ratings and gradually filling the set O of observed ratings one user at a time. We cycle
through a random permutation of the users until O contains the full training set. Every
time a new user is added, all of their ratings are added to O. After the first permutation
of users is completed, every time a user is selected this can change R, P and
Λ(·), moving all of that user's ratings to a different nym.

8. PERFORMANCE WITH REAL DATA 

8.1. Jester, Movielens, Dating & Books Datasets 

We follow the workflow and use the results of [Kannan et al. 2014]: the algorithms
compared are ALSWR (alternating least squares with regularisation) [Zhou et al. 2008],
SGD [Funk 2006], SVD++ [Koren 2008], Bias-SVD [Koren 2008] and BMF [Kannan
et al. 2014] on the datasets summarised in Table I.

Fig. 12. Prediction RMSE vs number of nyms used compared with state-of-the-art algorithms.

For each dataset we used the latent space dimension d that yielded the best RMSE
in [Kannan et al. 2014], and used the same proportional split between training,
validation and test data (85 %, 5 % and 10 % respectively) and the same validation technique
(subsampling validation). We include the Book dataset for completeness of comparison,
even though it is a somewhat "degenerate" dataset, so sparse that using the global book
average has been shown to be more effective than the state-of-the-art collaborative
filtering techniques. We stress that in the following only the BMF results reflect optimised
performance, because the other algorithms have many parameters (other than d)
that could be optimised but that were kept at their default values by the authors
of the comparison. We did not tune the BLC regularisation parameters, instead keeping
the variances of the priors large (1000) and relying on the averaging between the
users sharing the same nym to control overfitting.
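
The 85 % / 5 % / 10 % proportional split can be illustrated with a simple random partition. This sketch assumes a single uniform shuffle of the ratings; it does not reproduce the subsampling validation procedure of [Kannan et al. 2014], and the function name `split_ratings` is hypothetical.

```python
import random

def split_ratings(ratings, rng, train=0.85, validation=0.05):
    """Shuffle ratings and split them into training, validation and test sets."""
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    n_train = round(train * len(shuffled))
    n_val = round(validation * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

ratings = list(range(1000))              # stand-ins for (user, item, rating) triples
train_set, val_set, test_set = split_ratings(ratings, random.Random(0))
```

The remaining 10 % of the ratings form the test set, so the three parts are disjoint and together cover the full dataset.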

Figure 12 shows the RMSE of BLC vs. the number of nyms used and Table II
summarises the RMSE values. It can be seen that BLC and BLC_local are competitive with
the state of the art, with only minimal tuning (namely of the number of nyms). That
is, privacy-enhanced recommendation need not come at the cost of reduced prediction
accuracy.

Note also that only a small number of nyms is needed: fewer than 20 nyms gives good
performance for all datasets. That is, real data does indeed appear to possess a group
structure of the kind we have assumed. Further, the small number of nyms means that
we can expect that each nym is shared by a large number of users, thereby providing a
strong “hiding in the crowd” form of privacy; this is discussed in more detail below.
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Table II. Summary of the RMSE performance for the various algorithms 
and datasets considered, using validation sets from [Kannan et al. 2014].

Dataset     BMF    ALSWR   SVD++   SGD    Bias-SVD   BLC    BLC_local   (nyms)
Jester      4.33   5.64    5.54    5.72   5.82       4.30   4.20        64
Movielens   0.85   1.51    1.42    1.24   1.23       0.87   0.83        26
Dating      1.93   4.72    4.68    5.17   3.96       1.91   1.88        14
Books       1.94   4.71    4.73    5.18   3.95       1.96   1.87        1


8.2. Another Comparison for Movielens 10M 

We now compare BLC with another family of algorithms, based on nuclear norm
regularisation, for the Movielens 10M dataset. The algorithms are as follows:

JSH. An extension of the Frank-Wolfe algorithm for optimising a function over the
bounded positive semidefinite cone [Jaggi et al. 2010].

Soft-Impute. A soft singular value thresholding algorithm [Mazumder et al. 2010].

SSGD-Matrix-Completion. A more advanced algorithm based on stochastic subgradient
descent [Avron et al. 2012].

GECO. A greedy method with optimality guarantees for low rank matrix factorization
[Shalev-Shwartz et al. 2011].

For all of the algorithms, we report the RMSE from [Avron et al. 2012] in Table III.
Also shown in this table is the median RMSE of BLC over 5 samples, using the same
parameters as before and converging at 8 nyms. It can be seen that, once again, the
prediction accuracy of BLC is competitive with, if not superior to, the state of the art.


Table III. Performance comparison of algorithms for the
Movielens 10M dataset, using validation sets from [Avron
et al. 2012].

SSGD     JSH      Soft-Impute   GECO     BLC      BLC_local
0.8555   0.8640   0.8605        0.8771   0.8720   0.8452


8.3. Netflix Dataset 

Our evaluation would not be complete without showing results for the classic Netflix
challenge datasets, for which all the algorithms listed above were initially designed.
Unfortunately, the Netflix validation set is no longer available. Following
[Kannan et al. 2014], we therefore tested BLC against the validation set that was supplied
as part of the training package (using the same 1.4 M ratings training set described in
Table I). As before, we used the latent space dimension d = 20 that yielded the best
RMSE for the other algorithms in [Kannan et al. 2014] and we did not tune the BLC
parameters. It can be seen from Table IV that BLC achieves the second best performance,
using 128 nyms.


Table IV. Performance comparison of algorithms for the Netflix dataset.
BLC parameters were untuned.

BMF      ALSWR    SVD++    SGD      Bias-SVD   Time-svd   BLC      BLC_local
0.9533   1.5663   1.5453   1.2997   1.3882     1.1829     0.9875   0.9785
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Fig. 13. RMSE vs number of nyms used for Netflix dataset. 


In Figure 13, we compare BLC with the classic matrix factorization algorithm for a
dense (26 %) subset of the Netflix dataset, obtained by selecting the top 1000 items and
the top 10 000 users that rated them. These results further confirm that
use of a small number p = 100 of nyms is sufficient to match the classic matrix
factorization approach on this dataset, and that for a larger number of nyms the performance
surpasses that of the classic approach.


9. PRIVACY PERFORMANCE 

As discussed in Section 3, the attacks of interest can be decomposed into two types: (i) 
attacks which seek to learn which nym a user belongs to, and (ii) attacks which seek 
to learn which items a user has rated, and what that rating is, given knowledge of the 
nym to which the user belongs. 

With regard to (i), lacking additional side information the attacker can try to guess
the correct nym using the information in Λ, i.e. exploiting the fact that some nyms
might contain more users than others: the most populous nym is the attacker's best bet,
and the further the user-nym distribution is from the uniform distribution,
the greater the attacker's chance of correctly guessing the nym used by an arbitrary user.
In the worst case for the attacker (uniform distribution) the user obtains an
indistinguishability probability of 1/p, where p is the number of nyms, i.e. we have a form of
k-anonymity [Sweeney 2002] with k = p. In general, the probability P_g of guessing
the right nym is equal to the number of users in the largest nym divided by the total
number of users. Table V lists P_g for each of the datasets considered in the previous
section, with Figure 14 showing more detail for the Movielens dataset. For the online
book dataset BLC converged with 1 nym, leading to a probability of guessing P_g of
100 %. But it is interesting to note that if we let the algorithm run until it is using 8
nyms, P_g decreases to 15.24 % with a mere increase of 1.2 % in the prediction RMSE.
All the other datasets have a P_g of less than 23 %. The use of nyms therefore provides
a reasonable level of protection against this first type of attack.
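
The guessing probability P_g follows directly from the nym sizes, as in this minimal sketch. The assignment list is toy data and the function name `guess_probability` is hypothetical.

```python
from collections import Counter

def guess_probability(nym_of_user):
    """Probability of guessing a user's nym by always betting on the largest nym."""
    counts = Counter(nym_of_user)
    return max(counts.values()) / len(nym_of_user)

# Toy assignment of 10 users to p = 4 nyms (illustrative, not from the datasets).
assignment = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3]
p_g = guess_probability(assignment)          # largest nym holds 4 of 10 users
uniform_bound = 1 / len(set(assignment))     # 1/p, attained for equal-sized nyms
```

The gap between `p_g` and `uniform_bound` measures how far the user-nym distribution is from uniform, and therefore how much the attacker gains from knowing the nym sizes.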

We now consider the second type of attack of interest, where an attacker seeks to 
learn which items a user has rated given knowledge of the nym to which the user 
belongs. We define the following privacy measure: 


$$p(n, j) = \frac{\Lambda(j)_{nn}}{\text{total number of users sharing nym } n}$$
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Table V. Indistinguishability performance for BLC algorithm: number of 
nyms used and probability Pg of guessing which nym contains a target 
user. 


Dataset     Jester   Movielens [Avron et al. 2012]   Dating   Book          Netflix
no. nyms    64       8                               14       1 (8)         128
P_g [%]     3.5      22.17                           19.4     100 (23.21)   2.71



Fig. 14. Number of users per nym in Movielens dataset [Avron et al. 2012].

where n is the nym index, j the item index, and Λ(j)_nn, the number of users in nym n who
have rated item j, is normalised by the total number of users sharing nym n. It can be
seen that p(n, j), which we refer to as the association probability, is an estimate of the
probability that a user rated item j, given that the user chose nym n. Hence, smaller
values of p(n, j) correspond to increased privacy in the sense that it is harder for an
attacker to learn that a specific user in nym n rated item j, whereas when p(n, j) = 1
every user in nym n has rated item j and so an attacker can be certain that a
target user in nym n has rated that item. Intuitively, we expect that as the number
of users increases p(n, j) will tend to decrease and so privacy will increase. Figure 15
shows the distribution over all the nyms of the association probability of the worst
item. As expected, it can be seen that there is a downward trend with the number of
users for both the Movielens and online dating datasets. When the number of users is
more than around 10 % of that in the dataset, the association probability is always
less than 0.3, i.e. each user can deny that they rated any given item with probability
at least 0.7 even when an attacker knows which nym the user belongs to. This seems
like a substantial level of deniability, sufficient for most practical purposes. Figure 16
shows the impact of increasing the number of nyms used. It can be seen that using
more nyms improves prediction accuracy but reduces the level of privacy, as might be
expected. However, the reduction in privacy is relatively small, with the association
probability remaining consistently below 0.3.
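
The association probability and its worst-item value per nym can be computed from the nym assignments and the sets of rated items, as in the sketch below. The data layout and the function names `association_probability` and `worst_item` are assumptions made for illustration.

```python
def association_probability(ratings_by_user, nym_of_user):
    """Return p[(n, j)]: fraction of users in nym n who have rated item j.

    ratings_by_user: list of sets, the items rated by each user
    nym_of_user: list giving the nym index of each user
    """
    nym_size = {}
    raters = {}
    for items, n in zip(ratings_by_user, nym_of_user):
        nym_size[n] = nym_size.get(n, 0) + 1
        for j in items:
            raters[(n, j)] = raters.get((n, j), 0) + 1
    return {(n, j): c / nym_size[n] for (n, j), c in raters.items()}

def worst_item(p, n):
    """Largest association probability over the items rated in nym n."""
    return max(v for (nym, _), v in p.items() if nym == n)

ratings_by_user = [{0, 1}, {0}, {2}, {1}, {3}]
nym_of_user = [0, 0, 0, 1, 1]
p = association_probability(ratings_by_user, nym_of_user)
```

In this toy example two of the three users in nym 0 rated item 0, so an attacker who knows a target is in nym 0 can associate that item with the target with probability 2/3; larger nyms with more varied ratings drive this value down.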

Compared to the other algorithms considered in Section 8, BLC leads to an association
probability decrease of at least 70 % in all cases.

10. CONCLUSIONS 

We propose a privacy-enhanced matrix factorization recommender that exploits the
fact that users can often be grouped together by interest. This allows a form of “hiding
in the crowd” privacy. We introduce a novel matrix factorization approach suited to
making recommendations in a shared group (or nym) setting and the BLC algorithm
for carrying out this matrix factorization in a privacy-enhanced manner. We demonstrate
that the increased privacy does not come at the cost of reduced recommendation
accuracy since, when nyms are chosen appropriately, users sharing a nym can leverage
their shared ratings/preferences to make high quality recommendations.

Fig. 15. Boxplot of the association probability of the worst item vs. the number of users in the system for
the Movielens and Online dating datasets when using BLC.

Fig. 16. Association probability (left y axis) of worst item and RMSE (right y axis) of BLC for Movielens (a)
and Online dating (b) datasets. The distribution of the worst item association probability over the nyms is
shown as a boxplot (left y axis) and the RMSE of the prediction is shown vs the number of nyms used by
BLC.
Proof of Theorem 4.3. We begin by observing that $\operatorname{tr} U^T U$ is the squared
Frobenius norm of $U$, and thus is convex in $U_u$ for all $u$, and similarly
$\operatorname{tr} V^T V$ is convex in $V_v$ for all $v$. We also have that the squared-error
term of the objective is convex in $V_v$ (it is the usual least squares objective in $V_v$).
Since the Hessian of this function w.r.t. $U_u$ is positive definite for all $u$, we also have
that it is convex in $U_u$. It follows that $-\log p(U, V \mid \mathcal{R}_O; P)$ is
individually convex in $U$ and $V$.

Now consider the sequence $\{U^{(k)}, V^{(k)}\}$, $k = 1, 2, \dots$ generated by alternating
updates of $U$ and $V$. By Theorem 4.2, $U^{(k+1)}$ is a stationary point of
$-\log p(\cdot, V^{(k)} \mid \mathcal{R}_O; P)$ and by convexity it is therefore a global
minimum of $-\log p(U, V \mid \mathcal{R}_O; P)$ when $V$ and $P$ are held fixed.
Similarly, $V^{(k+1)}$ is a stationary point and so a global minimum of
$-\log p(U, \cdot \mid \mathcal{R}_O; P)$ when $U$ and $P$ are held fixed. It follows that
the sequence of log posteriors satisfies

$$-\log p(U^{(k)}, V^{(k)} \mid \mathcal{R}_O; P) \ge -\log p(U^{(k+1)}, V^{(k)} \mid \mathcal{R}_O; P) \ge -\log p(U^{(k+1)}, V^{(k+1)} \mid \mathcal{R}_O; P),$$

$k = 1, 2, \dots$, that is, it is descending.

When $U$, $V$ and $P_{u'}$ are held fixed for all $u' \neq u$, the update (11) for $P_u$ is,
by Lemma 4.1, a global minimum of the negative log posterior with respect to $P_u$. It follows
that the sequence $-\log p(U^{(k)}, V^{(k)} \mid \mathcal{R}_O; P^{(k)})$, $k = 1, 2, \dots$ is
also decreasing.

Matrix $P^{(k)}$ is, by construction, $\{0,1\}$-valued and so uniformly bounded for all $k$.
Assuming that $U^{(k)}$, $V^{(k)}$ also remain uniformly bounded for all $k$ then by the monotone
convergence of bounded sequences (e.g. [Billingsley 2008, Theorem 16.2]), the sequence
$\log p(U^{(k)}, V^{(k)} \mid \mathcal{R}_O; P^{(k)})$ converges to a finite limit as
$k \to \infty$. Let $\log p_\infty$ denote this limit and let $C_\infty$ denote the
corresponding set of limit points.

It remains to show that every limit point is a stationary point for $U$ and $V$ and
a local minimum for $P$. For $U$ and $V$ this follows from the alternating update and
Theorem 4.2. □
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